Obtaining Calibrated Probabilities from Boosting

Authors

  • Alexandru Niculescu-Mizil
  • Rich Caruana
Abstract

Boosted decision trees typically yield good accuracy, precision, and ROC area. However, because the outputs from boosting are not well-calibrated posterior probabilities, boosting yields poor squared error and cross-entropy. We empirically demonstrate why AdaBoost predicts distorted probabilities and examine three calibration methods for correcting this distortion: Platt Scaling, Isotonic Regression, and Logistic Correction. We also experiment with boosting using log-loss instead of the usual exponential loss. Experiments show that Logistic Correction and boosting with log-loss work well when boosting weak models such as decision stumps, but yield poor performance when boosting more complex models such as full decision trees. Platt Scaling and Isotonic Regression, however, significantly improve the probabilities predicted by both boosted stumps and boosted trees. After calibration, boosted full decision trees predict better probabilities than other learning methods such as SVMs, neural nets, bagged decision trees, and KNNs, even after these methods are calibrated.

Introduction

In a recent evaluation of learning algorithms (Caruana & Niculescu-Mizil 2006), boosted decision trees had excellent performance on metrics such as accuracy, lift, area under the ROC curve, average precision, and precision/recall break-even point. However, boosted decision trees had poor squared error and cross-entropy because AdaBoost does not produce good probability estimates. Friedman, Hastie, and Tibshirani (2000) provide an explanation for why boosting makes poorly calibrated predictions. They show that boosting can be viewed as an additive logistic regression model. A consequence of this is that the predictions made by boosting fit a logit of the true probabilities, as opposed to the true probabilities themselves. To get back the probabilities, the logit transformation must be inverted. In their treatment of boosting as a large margin classifier, Schapire et al.
(1998) observed that in order to obtain a large margin on cases close to the decision surface, AdaBoost will sacrifice the margin of the easier cases. This results in a shifting of the predicted values away from 0 and 1, hurting calibration. This shifting is also consistent with Breiman's interpretation of boosting as an equalizer (see Breiman's discussion in (Friedman, Hastie, & Tibshirani 2000)). In the next section we demonstrate this probability shifting on real data.

To correct for boosting's poor calibration, we experiment with boosting with log-loss, and with three methods for calibrating the predictions made by boosted models to convert them to well-calibrated posterior probabilities. The three post-training calibration methods are:

  • Logistic Correction: a method based on Friedman et al.'s analysis of boosting as an additive model.
  • Platt Scaling: the method used by Platt (1999) to transform SVM outputs from [−∞, +∞] to posterior probabilities.
  • Isotonic Regression: the method used by Zadrozny and Elkan (2001; 2002) to calibrate predictions from boosted naive Bayes, SVM, and decision tree models.

Logistic Correction and Platt Scaling convert predictions to probabilities by transforming them with a sigmoid. With Logistic Correction, the sigmoid parameters are derived from Friedman et al.'s analysis. With Platt Scaling, the parameters are fitted to the data using gradient descent. Isotonic Regression is a general-purpose non-parametric calibration method that assumes the probabilities are a monotonic transformation (not just a sigmoid) of the predictions. An alternative to training boosted models with AdaBoost and then correcting their outputs via post-training calibration is to use a variant of boosting that directly optimizes cross-entropy (log-loss).

* A slightly extended version of this paper appeared in UAI'05. Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
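The three calibration methods above can be sketched concretely. The following is a minimal sketch, not the paper's own implementation: it assumes scikit-learn, uses `AdaBoostClassifier` on synthetic data as a stand-in for the paper's boosted trees and real data sets, treats the classifier's `decision_function` margin as an approximation of the additive model F(x), and uses `LogisticRegression` on held-out scores in place of Platt's gradient-descent sigmoid fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
scores = boost.decision_function(X_cal)  # uncalibrated real-valued margins

# Logistic Correction: invert the logit implied by the additive-model view.
# Friedman et al.'s analysis gives F(x) ≈ (1/2) log(p / (1 - p)), hence
# p(y=1|x) = 1 / (1 + exp(-2 F(x))); no parameters are fitted to data.
# (Taking sklearn's scaled margin as F(x) is itself an approximation.)
p_logistic = 1.0 / (1.0 + np.exp(-2.0 * scores))

# Platt Scaling: fit sigmoid parameters A, B on a held-out calibration set,
# i.e. p(y=1|x) = 1 / (1 + exp(A * F(x) + B)).
platt = LogisticRegression().fit(scores.reshape(-1, 1), y_cal)
p_platt = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic Regression: learn an arbitrary monotone (not just sigmoid)
# map from scores to probabilities on the calibration set.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y_cal)
p_iso = iso.predict(scores)
```

For brevity the sketch scores the calibration set itself; in practice the calibrated probabilities would be evaluated on a third, untouched test set.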
Collins, Schapire, and Singer (2002) show that a boosting algorithm that optimizes log-loss can be obtained by a simple modification of the AdaBoost algorithm. Collins et al. briefly evaluate this new algorithm on a synthetic data set, but acknowledge that a more thorough evaluation on real data sets is necessary. Lebanon and Lafferty (2001) show that Logistic Correction applied to boosting with exponential loss should behave similarly to boosting with log-loss, and then demonstrate this by examining the performance of boosted stumps on a variety of data sets. Our results confirm their findings for boosted stumps, and show the same effect for boosted trees.

Our experiments show that boosting full decision trees usually yields better models than boosting weaker stumps. Unfortunately, our results also show that boosting to directly optimize log-loss, or applying Logistic Correction to models boosted with exponential loss, is only effective when boosting weak models such as stumps. Neither of these methods is ...
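The log-loss alternative can also be sketched. The sketch below is not Collins et al.'s modified AdaBoost; assuming scikit-learn, it uses gradient boosting, which fits each stage to the gradient of log-loss, as an illustrative stand-in, with `max_depth=1` giving boosted stumps.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# Boosted stumps (max_depth=1) trained to minimize log-loss directly, so
# predict_proba is intended to be usable without post-hoc calibration.
gbm = GradientBoostingClassifier(max_depth=1, n_estimators=200, random_state=0)
gbm.fit(X_train, y_train)
proba = gbm.predict_proba(X_test)[:, 1]
test_log_loss = log_loss(y_test, proba)
```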


Similar articles

Calibrated Boosting-Forest

Excellent ranking power along with well-calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest, that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or clas...

Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers

Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and cha...

Obtaining Accurate Probabilistic Causal Inference by Post-Processing Calibration

Discovery of an accurate causal Bayesian network structure from observational data can be useful in many areas of science. Often the discoveries are made under uncertainty, which can be expressed as probabilities. To guide the use of such discoveries, including directing further investigation, it is important that those probabilities be well-calibrated. In this paper, we introduce a novel frame...

Calibrated Structured Prediction

In user-facing applications, displaying calibrated confidence measures— probabilities that correspond to true frequency—can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space...

Obtaining Well Calibrated Probabilities Using Bayesian Binning

Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in artificial intelligence. In this paper we present a new non-parametric calibration method called Bayesian Binning into Quantiles (BBQ) which addresses key limitations of existing calibration methods. The method post processes the output of a binary classification algori...


Publication date: 2005